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Abstract 

Polymorphisms that affect complex traits or quantitative trait loci (QTL) often affect multiple traits. We describe two novel 
methods (1) for finding single nucleotide polymorphisms (SNPs) significantly associated with one or more traits using a 
multi-trait, meta-analysis, and (2) for distinguishing between a single pleiotropic QTL and multiple linked QTL. The meta- 
analysis uses the effect of each SNP on each of n traits, estimated in single trait genome wide association studies (GWAS). 
These effects are expressed as a vector of signed t-values (t) and the error covariance matrix of these t values is 
approximated by the correlation matrix of t-values among the traits calculated across the SNP (V). Consequently, t'V~^t is 
approximately distributed as a chi-squared with n degrees of freedom. An attractive feature of the meta-analysis is that it 
uses estimated effects of SNPs from single trait GWAS, so it can be applied to published data where individual records are 
not available. We demonstrate that the multi-trait method can be used to increase the power (numbers of SNPs validated in 
an independent population) of GWAS in a beef cattle data set including 10,191 animals genotyped for 729,068 SNPs with 32 
traits recorded, including growth and reproduction traits. We can distinguish between a single pleiotropic QTL and multiple 
linked QTL because multiple SNPs tagging the same QTL show the same pattern of effects across traits. We confirm this 
finding by demonstrating that when one SNP is included in the statistical model the other SNPs have a non-significant 
effect. In the beef cattle data set, cluster analysis yielded four groups of QTL with similar patterns of effects across traits 
within a group. A linear index was used to validate SNPs having effects on multiple traits and to identify additional SNPs 
belonging to these four groups. 
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Introduction 

Polymorphisms that afTect complex traits (quantitative trait loci or 
QTL) may affect multiple traits. This pleiotropy is the main cause of 
the genetic correlations between traits, although another possible 
cause of genetic correlation is linkage disequilibrium (LD) between 
the QTL for different traits. A positive genetic correlation that is less 
than 1 .0 between two traits, such as weight and fatoess, implies that 
some QTL affect both traits in the same direction, but other QTL 
may affect only one trait and a small number may even affect the 
traits in the opposite direction. Identifying QTL with different 
patterns of pleiotropy should help us to understand the physiological 
control of multiple traits. Although genome wide association studies 
(GWAS) are usually performed one trait at a time, it is not 
uncommon to find that two traits are associated with SNPs in the 
same region of a chromosome. This has been described as cross 
phenotype association [1]. Resolving whether cross phenotype 
associations are due to one QTL witli pleiotropic effects or two 
linked QTL [1] has proved challenging, given the large number of 
loci that appear to cause variation in complex traits [2-5] . 



In practice, the apparent effect of a SNP on a trait is estimated 
with some experimental or sampling error. Consequently, even ff 
there is a single QTL in a region of the chromosome, the SNP 
with the strongest association may vary from one trait to another 
causing the estimated position of the QTL to vary between traits. 
If one QTL can explain the findings for the multiple traits then a 
multi-trait analysis might result in higher power to detect QTL 
and greater precision in mapping them. Multiple-trait analysis of 
linkage experiments has been reported to increase the power to 
detect QTL [6,7]. This paper investigates whether additional 
power can be extracted from a GWAS by analyzing traits together 
rather than one at a time. 

In principle, provided the computing power exists, a multi-trait 
GWAS is statistically straightforward. However, typically not aU 
subjects have been measured for all traits, and when different traits 
have been investigated in different experiments, the individual 
subject data may not even be available. Therefore we present an 
approximate, multi-trait meta-analysis that uses as data the 
estimated effects of the SNPs from n individual trait GWAS. These 
effects are expressed as a vector of signed t-values (t) and the error 
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Author Summary 

We describe novel methods for finding significant associ- 
ations between a genome wide panel of SNPs and 
multiple complex traits, and further for distinguishing 
between genes with effects on multiple traits and multiple 
linked genes affecting different traits. The method uses a 
meta-analysis based on estimates of SNP effects from 
independent single trait genome wide association studies 
(GWAS). The method could therefore be widely used to 
combine already published GWAS results. The method was 
applied to 32 traits that describe growth, body composi- 
tion, feed intake and reproduction in 10,191 beef cattle 
genotyped for approximately 700,000 SNP. The genes 
found to be associated with these traits can be arranged 
into 4 groups that differ in their pattern of effects and 
hence presumably in their physiological mechanism of 
action. For instance, one group of genes affects weight 
and fatness in the opposite direction and can be described 
as a group of genes affecting mature size, while another 
group affects weight and fatness in the same direction. 

covariance matrix of these t values is approximated by a nxn 
correlation matrix of t-values among the traits calculated across 
the SNP (V). Consequently, t'V~'t is approximately distributed as 
a chi-squared with n degrees of freedom. The meta-analysis used 
here, although approximate, appropriately models the variances 
and covariance among the t-values regardless of the overlap in 
individuals measured for the different traits. The different amount 
of information for the different traits (e.g. dififerent number of 
individuals genotyped, size of error variance relative to SNP effect) 
is accounted for in the analysis. 

To distinguish between pleiotropy and multiple, linked QTL, 
we use two diflferent analyses. Firstly, we consider whether all 
SNPs in a region do or do not show the same pattern of effects 
across traits. Secondly, we fit the most significant SNP from the 
multi-trait analysis in the model to test whether this does or does 
not eliminate the evidence for a second QTL. 

An aim of GWAS is to identify the genes and polymorphic sites 
in the genome that cause variation in complex traits. Choosing the 
most likely candidate genes from the region surrounding a SNP is 
usually based on the relationship between the function of the gene 
and the trait. Assuming that some QTL show pleiotropy, the 
pattern of pleiotropic effects would be an important clue to the 
nature of the causative mutation and the function of the gene in 
which it occurred. Genes that belong to the same pathway might 
have a similar pattern of pleiotropic effects. Therefore we 
investigate whether QTL can be clustered into groups with a 
similar pattern of pleiotropic effects and hence into physiologically 
similar groups. 

The objectives of this study were to test the power of a multi- 
trait, meta-analysis to detect and map pleiotropic QTL affecting 
growth, feed conversion efficiency, carcass composition, meat 
quality and reproduction in beef catde. We also investigate 
whether these pleiotropic QTLs can be placed in groups with a 
similar pattern of effects and hence similar underlying physiolog- 
ical mechanisms. 

Results 

Power of multi-trait meta-analysis to detect QTL 

False discovery rate. In this study, the 10,191 cattle had real 
or imputed genotyises for 729,068 SNP, although not all cattle 
were measured for all traits. The cattle were sourced from 9 



different populations (Angus, Murray Grey, Shorthorn, Hereford, 
Brahman, Belmont Red, Santa Gertrudis, Tropical composites, 
and recent Brahman crosses). Single trait genome wide association 
studies were performed for the 32 traits listed in Table 1 and the 
results have been previously reported by Bolormaa et al. [8] . The 
traits include measures at different ages of height, weight, fatness, 
muscularity, feed intake, meat tenderness, age at puberty, and 
interval of postpartum anoestrus (Table 1). 

In the multi-trait analysis we measured the effect of a SNP on 
each trait by a signed t-value (effect/standard error of effect) 
and approximated the (co)variance matrix among the traits 
using the correlation matrix of these SNP effects. Then the 
effects of a SNP across the 32 traits were combined with this 
correlation matrix to perform a multi-trait chi-squared test with 
32 degrees of freedom of the null hypothesis that a SNP has no 
effect on any trait. 

For this test, 2,028 SNPs were significant (P<5xl0"') 
(Figure 1). This corresponds to a false discovery rate (FDR) of 
0.02%, and this was lower than for any individual trait; when 
traits were analyzed individually, for 19 out of 32 traits the FDR 
at P<5xl0"' ranged from 0.04% to 2.5% and for the 13 
remaining traits FDR was greater than 3.5% (Table 2). Therefore 
the multi-trait test had greater power to detect QTL than the 
individual trait analyses. 

Validation of SNP effects. To validate the associations 
found in the multi-trait analysis, we used individual level data. The 
data were split into a discovery sample and a validation sample. 
The whole data were spht into five sets by allocating all of the 
offspring of randomly selected sires to one of the five datasets. 
Then one of the 5 divisions was randomly selected as a validation 
population and the other 4 divisions as the reference population. 
In this way no animal used for validation had paternal half sibs in 
the reference population. 

From the multi-trait analysis of the discovery dataset, the most 
significant SNPs {P< 1 0 ') were retained and then to avoid 
identifying a large number of closely linked SNPs whose 
association with traits is due to the same QTL, only the most 
significant SNP in a 1 Mb interval was selected for validation. 

For each SNP we calculated the linear index of 22 traits. This 
linear index had maximum correlation with the corresponding 
SNP. Then the association between a SNP and its correspond- 
ing linear index was tested in the validation sample. To do this 
we needed individual animals that had been measured for 
nearly all traits. However, the bulls and cows, which had been 
measured for 10 reproductive traits, were not measured for the 
other 22 traits. Therefore we based the validation on animals 
measured for the 22 non-reproductive traits and calculated the 
linear index for each SNP based on these 22 traits (Table 3). 
Out of the 244 significant SNPs, 207 or 85% had an effect in the 
same direction in the validation sample as in the discovery 
sample. The size of the validation sample (1,899 animals) limited 
its power but 72 of the 244 SNPs were significant (P<0.05) and 
71 of the 72 had an effect in the same direction as in the 
discovery sample. 

To compare the power of detecting QTL in the multi-trait 
analysis to that in the single-trait analysis, we performed the same 
validation analysis for the single trait post weaning live weight 
(PW_lwt), which is one of the traits with the highest number of 
significant associations (Table 3). For PW_lwt, only 79 SNP met 
the criterion (P< 10"^ and one SNP per Mb) and of these 60 (76%) 
had an effect in the same direction in the validation sample as in 
the discovery sample, but only 13 of the 79 were significant at P< 
0.05 in the validation sample. This shows that multi-trait analysis 
detected more associations and validated a higher percentage of 
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Table 1. Number of records, mean, standard deviation (SD), heritability estimate (h^) of each trait for the genotyped animals and 
their 5-generation ancestors.^ 





Trait ID 


No. of Animals 


Mean 


SD 




Trait name 


PW_hip 


6359 


120.5 


8.1 


0.55 


Hip height measured post weaning (cm) 


SF_hip 


1854 


131.7 


8.1 


0.51 


Hip height measured at feedlot entry (cm) 


X_hip 


2037 


139.2 


8.2 


0.36 


Hip height measured at feedlot exit (cm) 


HUMP 


1132 


139.7 


38 


0.34 


Hump height as assessed by MSA grader (mm) 


PW_lwt 


9884 


238.9 


55.6 


0.42 


Live weight measured post weaning (kg) 


X_lwt 


5992 


504.2 


95.8 


0.44 


Live weight measured at feedlot exit (kg) 


MIDWT 


1585 


89.5 


14.6 


0.72 


Metabolic mid-test weight in the RFI test period (kg°^^) 


ADG 


1936 


1.4 


0.4 


0.43 


Average daily gain over RFI test period (kg) 


RFI 


4026 


-1.4 


2.1 


0.38 


Residual feed intake (kg) 


PWIGF 


918 


276 6 


149 3 


0 37 




EIGF 


1103 


510.1 


1 86 6 


0 17 


I^F-I nno^c 1 1 rf^H tfif^ri lot f^n1"r\/ fnn /mM 


XIGF 


948 


621 


1 33 6 


0 2 




CPS 


5727 


113 


4.7 


0.39 


P8 f3t depth 3t sIdUQhtcr (mm} 


CRIB 


5464 


7.6 


4.1 


0.34 


rib fst 3t sIsuQhtor (mm) 


SP8 


4779 


g 


3.6 


0.54 


Exit scsnnGcl P8 fst depth (mm) 


SRIB 


4779 


1 1.2 


4.5 


0.52 


Exit sc3nned rib fst (mm) 


CIMF 


5824 


3.6 


2 


0 4 


Porf~ont inlT^m E f 1^ r f^l" mf^^ c 1 1 rf^H [r\ 1 onn i c^i m 1 1 c 1 1 m n/^rii m 
rcrll_d1L II 1 LI al 1 1 U^l_Li Idl laL lll^a^urtrLI III I-Lil luibjil I lu^ lUI 1 IU*JI til 1 1 

muscle (%} 


CMARB 


4228 


0.8 


0.8 


0.27 


Ausmeat marble score as assessed by MSA grader (score) 


CEMA 


1557 


75.1 


8.6 


0.43 


Eye muscle area at slaughter (cm^) 


SEMA 


4539 


68.1 


10.9 


0.17 


Exit scanned eye muscle area (cm^) 


CRBY 


2684 


67 


3.4 


0.46 


Carcass retail beef yield (%) 


LLPF 


5358 


4.5 


1 


0 3 


Pf^^K ft~irf~(^ rYM^^zi 1 rfiH in 1 r^nn ic^i m 1 1 c 1 1 m hr^ri i m m 1 1 / urt^ 
rtralx ICIv-U llltra^UlcTU III l_^l ILJlj^liriUb lUIIIUL'lUIII IIIU^l_ltr / 


SCI 2 


1112 


21.2 


2.7 


0.62 


Scrotal circumference measured at ages of 12 months (cm) 


PNS24 


964 


73.6 


22.1 


0.23 


Percentage of normal sperm at the age of 24 months (%) 


AGECL_BB 


1007 


751 


114.9 


0.56 


Age at first detected corpus luteum in BB (days) 


AGECL_TC 


1108 


650 


104.8 


0.49 


Age at first detected corpus luteum in TC (days) 


PPALBB 


629 


180 


3.8 


0.51 


Post partum anoestrus interval in BB (days), 


PPAI_TC 


863 


141 


3 


0.29 


Post partum anoestrus interval in TC (days), 


WTCL_BB 


993 


334 


42 


0.56 


Live weight measured at the age when the first corpus luteum 
in BB (kg) 


WTCL_TC 


1094 


329 


41.3 


0.46 


Live weight measured at the age when the first corpus luteum 
in TC (kg) 


P8CL_BB 


951 


4.5 


2.1 


0.47 


P8 fat depth measured at the age when the first corpus 
luteum in BB (mm) 


P8CL_TC 


1083 


3 


1.5 


0.42 


PS fat depth measured at the age when the first corpus 
luteum in TC (mm) 



= similar summary statistics for 19 of the above traits can be found in [8]. 
doi:1 0.1 371 /journal.pgen.l 0041 98.t001 



them than the best single trait analysis. For other single traits the 
proportion of significant SNPs from the reference that were 
vahdated was even lower than for PW_lwt. 

Precision of multi-trait meta-analysis for QTL mapping 

Many of the significant SNPs in both single trait and multi- 
trait analyses were linked and might be associated with the same 
QTL. As an example of the multi-trait approach to improve 
precision, Figure 2A shows the significance of SNP effects for 4 
single trait GWAS and our multi-trait statistic in a region of 
chromosome 5 (BTA 5). The 4 separate traits map the QTL to 
slightly different positions (range: 47,732-48,877 kb). For the 
multi-trait statistic, based on SNP effects from single-trait GWAS 



for 32 traits, the most significant SNP (P= 1 .32 x 10~^') was 
located at 47,728 kb. The multi-trait analysis represents a good 
compromise between the positions from the 4 single trait GWAS 
and may be the best guide to a single QTL position explaining 
all the associated traits. 



Multi-trait meta-analysis tends to find SNPs near genes 

SNPs were classified according to their distance from the 
nearest gene and the proportion of SNPs at each distance from a 
gene that were significant (P< 1 0 ') in the multi-trait analysis was 
calculated. Figure 3 shows that SNPs were more likely to be 
significantly associated with the 32 traits if they were within or less 
than 100 kb from a gene. 
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Single-trait GWAS to test pleiotropy or linkage 

There are many regions of the genome, similar to that 
illustrated in Figure 2A, where multiple traits had significant 
associations with one or more SNPs. For each SNP their estimated 
effects on each trait were expressed as a signed t-value. For each 
pair of SNPs we calculated the correlation across the 32 traits 
between their estimated effects so that SNPs with the same pattern 
of effects across traits are highly positively or negatively correlated. 
Figure 4 shows the correlation between SNPs on a region of 
chromosomes 7 and 14. All SNPs in the vicinity of 25 Mb on 
chromosome 14 are highly correlated indicating a single pleiotro- 
pic QTL in this region, corresponding to previous reports of a 
polymorphism near the gene PLAGl that affects many traits [9- 
1 1] . On chromosome 7 there are three blocks of SNPs with high 
correlations within a block and low correlations between blocks 
suggesting there are three QTL, close to 93, 95 and 98 Mb. 
The QTL at 98 Mb corresponds to a previously reported 



polymorphism in calpastatin [CAST) [12,13]. Below, we confirm 
this interpretation by fitting the most significant SNPs in the model 
and testing for additional associations. 

Conditional analyses to test pleiotropy or linkage 

Detection of pleiotropic QTL. Many highly significant 
SNPs from the multi-trait analyses were found within narrow 
regions on chromosomes (BTA) 3, 5, 6, 7, 14, 20 and 29 (Figure 1). 
If there is only a single QTL in a region and if it is perfectly tagged 
by one of the SNPs, then when this SNP is fitted in the model the 
other nearby SNPs should have no significant association with the 
phenotypes. To test this hypothesis we selected 28 'lead' SNPs 
(Table 4), representing what appeared to be 28 QTL across the 
genome. GWAS were re-performed but the SNP; (SNP;, i = 1, 2, 3, 
729068) along with the 28 lead SNPs were simultaneously 
fitted in the model, and then the multi-trait statistic was re- 
calculated for SNP; to test the effects of the SNP; across traits after 
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Table 2. Number of SNPs and their false discovery rates (%) at P<5xl0 ^ for each trait before and after fitting the 28 leading 
SNPs in the model. 



without 28 lead SNPs With 28 lead SNPs 



Trait* 


No. 


FDR 


No. 


FDR 


rW_nip 


91 2 


0.0 


62 


0.6 


SF_hip 


36 


1.0 


28 


1.2 


X_hip 


82 


0.4 


2 


17.3 


HUlVlr 


14 


2.5 


14 


2.5 


rVV_IWt 


545 


0.1 


32 


1.1 


X_lwt 


543 


0.1 


8 


4.3 


MIUW 1 


4 


8.7 




34.6 


ADG 


17 


2.0 




34.6 


RFI 


25 


1.4 


12 


2.9 


PWIGF 


267 


0.1 




34.6 


EIGF 


280 


0.1 




34.6 


XIGF 


5 


6.9 


5 


6.9 


CPS 


325 


0.1 


6 


5.8 


SP8 


24 


1 .4 


6 


5.8 


CRIB 


36 


1.0 


4 


8.7 


SRIB 


0 




0 




CIMF 


58 


0.6 


10 


3.5 


LJVIAKd 


1 


34.6 


1 


34.6 


CEMA 


0 




0 




SEMA 


0 




0 




CRBY 


36 


1.0 


7 


4.9 


LLPF 


547 


0.1 


8 


4.3 


SCI 2 


3 


1 1.5 






PNS24 


1 


34.6 






AGECL_BB 


497 


0.1 






AGECL_TC 


0 








PPALBB 


2 


17.3 






PPALTC 


10 


3.5 






WTCL_BB 


406 


0.1 






WTCL_TC 


180 


0.2 






P8CL_BB 


0 








P8CL_TC 


0 









* = empty cells are not available. 
doi:1 0.1 371 /journal.pgen.l 0041 98.t002 



fitting the 28 lead SNPs. Figure 5 shows the results for BTA 14 as 
an example. In the original multi-trait GWAS, many SNPs 
between 20 and 40 Mb on BTA 14 were significant but after 
fitting the 28 lead SNPs, which include one at 25 Mb on BTA 14, 
there were no more significant SNPs in this region than in the rest 
of BTA 14 (Figure 5 shows the lead SNP as weU as all other SNPs). 

All 28 lead SNPs remained significant in this conditional 
analysis, even after fitting the other 27, showing that each tags a 
different QTL. For instance, on BTA 7 the 2 lead SNPs at 93 and 
98 Mb remain significant as does a SNP at 95 Mb (Figure 6). This 
confirms the interpretation of the correlation analysis (Figure 4) 
that there are 3 QTL in this narrow region. The apparent effects 
of the 28 lead SNPs on the 32 traits, as estimated in the original 
single- trait GWAS, are given in Table 5 (only values with 1 1 1 > 1 
are reported). 



In some cases, a SNP close to the lead SNP remains significant 
even after fitting the 28 lead SNPs. This could be because of 
imperfect LD between the lead SNP and the causal mutation so 
that other SNP may explain some of the variance caused by the 
causal mutation in addition to the lead SNP. Alternatively, there 
may be more than one causal variant in the same gene each 
tracked by a different SNP. In fact, there were stiU many 
significant SNPs (P<5xl0 ) scattered throughout the genome 
(eg., there were 62 significant SNPs for PW_hip; Table 2) 
indicating that there are likely to be many more than 28 QTL 
affecting these 32 traits. 

Mapping of pleiotropic QTL. In Figure 2B the significance 
of SNP effects for 4 single trait GWAS in a region of chromosome 
5 is presented, when the ith SNP (SNP;, i= 1,2,3, 729068) 
along with 28 lead SNPs were simultaneously fitted in the GWAS 
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Table 3. Number of significant SNPs {P<^0~ 


in reference population that were 


also significant in the validation population. 




/Lvalue in validation 


No. of SNP 


FDR% 


%-same 


multi-trait 


0.0001 


5 


0.5 


100 


0.001 


9 


2.6 


100 


0.01 


35 


6.0 


97 


0.05 


72 


12.6 


99 


all 


244 




85 


single-trait (PW_lwt) 


0.0001 


0 






0.001 


0 






0.01 


8 


9.0 


100 


0.05 


13 


26.7 


100 


all 


79 




76 



%-same = percentage of SNPs, which have an effect in the same direction in both validation and reference sets. 
doi:1 0.1 371 /journal.pgen.l 0041 98.t003 



model. The results from this conditional analysis show that the 
lead SNP is significant (P<10~'^) for all 4 traits, but once this SNP 
is included in the model, no other nearby SNPs reach this level of 
significance for any of the 4 traits. 

Clustering of QTL with similar pattern of effects across 
traits 

For each pair of SNPs among the 28 lead SNPs, the correlation 
of their effects across the 32 traits was calculated (Figure 7). There 
are a few correlations with high absolute value, such as between 
the lead SNPs on BTA 5, 6 and 14, but most correlations are low. 
A low correlation suggests QTL with different patterns of effects 
across traits, however sampling errors in estimating SNP effects 
also reduce the absolute value of the correlation. If two QTL affect 
the same physiological pathway one might expect them to have the 
same pattern of effects and hence a high correlation. Cluster 
analysis based on effects of the SNPs across traits divided the 28 
lead SNPs into 4 loosely defined groups (Figure 7), which share 
patterns of effects across traits (although there are still diflferences 
within each group in the exact pattern of effects across traits) 
(Table 5). 

Group 1 consists of 4 lead SNPs located on BTA 5 
(BTA5_47.7 Mb), 6 (BTA6_40.1 Mb), 14 (BTA14_25.0 Mb) 
and 20 (BTA20_4-9 Mb). This group clustered as an outer branch 
separate from the other 24 lead SNPs (Figure 7), indicating that 
this group of SNPs clusters more tightly than the other groups. 
Three of these 4 SNPs were highly correlated amongst each other 
while the SNP on BTA 20 had slighdy lower correlations to the 
other 3 SNPs. Table 5 shows that these 4 SNPs have an allele that 
increases height and weight and decreases fatness, RFI and blood 
concentration of IGFl. They could be described as changing 
mature size. 

Group 2 consists of SNPs on BTA 7 (BTA7_98 Mb), 10 
(BTA10_92 Mb), 25 (BTA25_3.7 Mb), 26 (BTA26_28.0 Mb), 
and 29 (BTA29_44.8 Mb) with high correlations between 2 SNP 
on BTA 7 and 29. These SNPs have an allele that increases meat 
tenderness (i.e., decrease shear force) and fatness (i.e., marbling or 
intra-muscular fat) (Table 5). The SNPs at BTA7_98.5 Mb and 
BTA29_45.8 Mb have a large effect on shear force and map to the 
positions of known genes affecting this trait (Calpastatin and 
Calpain 1) [12,14,15]. 



Group 3 consists of 7 SNPs that are located on BTA 2 
(BTA2_25.2 Mb), 3 (BTA3_80.1 Mb), 6 (BTA6_12.7 Mb), 13 
(BTA13_34.9 Mb), 17 (BTA17_24.9 Mb), 19 (BTA19_25.1 Mb), 
and 25 (BTA25_14.5 Mb). There was weaker clustering and lower 
correlations between these SNP compared to groups 1 and 2. The 
SNPs of Group 3 have an allele that increases both fatness and 
weight but has little effect on height or IGFl (Table 5). This 
distinguishes these SNPs from those in Group 1 where the allele 
that increases weight also decreases fatness and IGFl. 

Group 4, the biggest group, consists of 12 SNPs in a loose 
cluster. Moderate correlations appeared between some SNPs on 
BTA 7 (BTA7_93.2 Mb), 9 (BTA9_100.5 Mb), 21 (BTA21_ 
0.9 Mb), 21 (BTA21_19.0 Mb), 23 (BTA23_43.9 Mb), 4 
(BTA4_77.6 Mb) and 8 (BTA8_59.2 Mb) (Figure 7). This group 
has an allele that tends to increase muscling, retail beef yield 
(RBY), tenderness and feed efficiency, and decrease fatness. The 
clustering did separate the 2 SNPs on BTA 7 with the SNP near 
98 Mb belonging to Group 2 and the SNP near 93 Mb belonging 
to Group 4. 

Although the SNPs within a group share some features they also 
differ in some of their associations. For instance, in Group 1 the 
SNP on BTA 14 near PLAGl has a more marked effect on age at 
puberty (AGECL) than others in the group; the SNP on BTA5 
changes the distribution of fat between the P8 site on the rump 
and rib site and the intramuscular depot. Thus it is possible for the 
each SNP to have a unique pattern of associations with phenotypic 
traits. 

Finding additional QTL in the same pathway 

The pattern of pleiotropic effects might be an important clue to 
the nature of the causative mutation and the function of the gene 
in which it occurred. Genes that operate in the same pathway 
might be expected to show the same pattern of pleiotropic effects. 
For each of the 28 lead SNPs, we searched for additional SNPs 
with a similar pattern of effects. To do this we used the linear 
index of 22 traits that showed the highest association with a lead 
SNP, as previously defined for validation of the multi-trait analysis, 
and performed a new GWAS using the linear index as a new trait. 
Table 6 shows the number of significant (P< 10 ') SNPs for the 28 
linear indexes corresponding to the 28 lead SNPs. Out of 28 linear 
indexes, 19 had more than 70 significant SNPs and hence a FDR 
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Figure 2. A: The — logio(/'-values) of single SNP regressions for 4 traits and multi-trait chi-squared statistic on a region of BTA 5; B: 
TKie — logio(^-values) of single SNP regressions for 4 traits when SNPi along with 28 lead SNPs were simultaneously fitted in the 
GWAS model. 
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of less than 10%. Linear index BTA5_47.7 Mb, BTA6_40.1 Mb, 
BTA14_25.0 Mb, and BTA20_4.9 Mb (where the name corre- 
sponds to the location of the QTL defining the pattern of elfects) 
have associations with over 1,000 significant SNPs across the 
genome. For the index based on the lead SNP BTA5_47.7 Mb, the 
significant SNPs included 615 SNPs on BTA 5, 64 on BTA 6, 24 on 



BTA 11, 907 on BTA 14, 19 on BTA 17, 18 on BTA 20. This 
reiterates the result obtained in the cluster analysis because SNPs on 
BTA 5, 6, 14 and 20 are the lead SNPs in Group 1 and the 
additional SNPs on these chromosomes may be tagging the same 
QTL as the lead SNPs. However, there are also significant SNPs 
associated with this linear index on BTA 11, 17, 19, 21 and 25. 
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The additional significant SNPs were assigned to the 4 groups as 
follows. For each SNP, the linear index with which it showed the 
most significant association (P<5 x 10~ ) was found. The SNP was 
then assigned to the same group as the lead SNP defining that 
linear index. The results are shown in Figure 8. Usually this 
procedure identified a set of closely linked SNPs, presumably 
indicating a single QTL. Therefore we kept in the final group only 
the most significant SNP (P<5 x 10 ) from each set. The number 
of significant SNPs assigned to each of the 4 Groups were as 
follows: 1) 2,076; 2) 398; 3) 169 and 4) 176. The positions or 
regions of the most significant SNPs in the expanded groups are 
listed in Table 7. 

Candidate genes 

For each SNP or group of SNPs in Table 7 we examined the 
genes within 1 Mb and, in some cases, identified a plausible 
candidate for the phenotypic effect (Table 7). Focusing on those 
regions with multiple SNPs, the genes CAPJVl, CAST, and PLAGl, 
were again identified, which are strongly identified with meat 
quality and growth in previous catde studies [16-18]. In addition, 
we identified the genomic regions that include the HMGA2, LEPR, 
DAGLA, ZEBl, IGFBP3, FGF6 and ARRDC3 genes as having 
strong genetic effects in cattle. HA'IGA2 and LEPR are well known 
to have effects on fatness and body composition in pigs [19,20]. 



SNP in the promoter of IGFBP3 have been shown to affect the 
level of IGEBP3 in humans, which affects availability of circulating 
IGF 1 and has a multitude of effects on growth and development 
[21]. Here we show a strong effect for IGFBP3, where previous 
results for marbling or backfat have either been small or non- 
significant [22,23]. Differences in gene expression of FGE6 has 
been shown to be associated to muscle development in cattie [24], 
and here we show that genetic variation at EGF6 is associated with 
effects on Group 4 traits, which include muscling and yield traits. 
ARRDC3 is a gene involved in beta adrenergic receptor regulation 
in cell culture [25], and beta adrenergic receptor modulation is 
involved in tenderness, growth and muscularity in cattle [26,27]. 
Here we show that variation at ARRDC3 is strongly associated with 
growth and muscularity traits in these cattle. 

Discussion 

We demonstrated that our multi-trait analysis has a lower FDR 
than any one single trait analysis (at the same significance test P- 
value) and that these SNPs are more likely to be vahdated in a 
separate sample of animals. The most significant SNP in the multi- 
trait analysis provides a consensus position across the traits affected 
and a consistent set of estimates of the QTL for the various traits. 
This is in contrast to single trait analyses that often report the effect 
of different SNPs on each trait while neglecting the pattern of 
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Figure 4. Correlations between pairs of the SNP effects on 32 traits. A: Correlations on BTA7 from 93 Mb to 99 Mb. Three blocks of SNPs 
with high correlations within a block and low correlation between blocks are shown in blue. B: Correlations on BTA 14 near 25 Mb. The blue line 
shows the SNPs closest to the PLAGl gene. 
doi:1 0.1 371/journal.pgen.1 0041 98.g004 



effects of the QTL across traits. For instance, the multi-trait 
analysis makes it clear that the QTL in Group 1 increase weight 
and decrease fatness whereas QTL in Group 3 increase both. 

Other methods are available for multi-trait analysis [1], but the 
method used here has advantages. It can and has been applied to 
data where the individuals measured for different traits are 
partially overlapping and where the individual level data are not 
available. It utilises the estimated effects of the SNPs as well as the 
P-values and takes account of traits where the effects of a SNP 
may be in opposite directions. An alternative approach is 
illustrated by Andreasson et al. [28] in which only SNPs that 
are significant for one trait are tested for a second trait. However, 
this approach is only applicable when different individuals have 
been recorded for each trait and does not generalise easily to 
more than 2 traits. 

Ideally, in the multi-trait analysis, the matrix V (the correlation 
matrix among the SNP effects) would contain the covariances 
among the errors in the estimates of SNP effects. The error 
variance of a t-value with lOOO's of degrees of freedom is very close 
to 1.0. Our approximation to V also has diagonal elements of 1.0 
because it is a correlation matrix. The covariance between the 
errors in t-values for two different traits depends on the overlap in 
individuals measured for the two traits. If the two traits are 
recorded on different individuals, there is no covariance among 
the errors; whereas if the two traits are measured on the same 
individuals, the error covariance will be mainly determined by the 
phenotypic correlation between the traits because single SNPs 
explain httle of the phenotypic variance. We approximate these 
error covariances by the correlation between t-values across 
729,068 SNPs. Since most SNPs have littie association with a 
given trait, these correlations represent phenotypic correlations in 
the case where both traits are measured on the same individuals. If 
the two traits are measured on different individuals, then the 
correlation of t-values is close to zero as it should be. And if there is 
a partial overlap between the individuals measured for the two 



traits, then the correlation of t-values will represent this. Thus the 
meta-analysis used here, although approximate, appropriately 
models the variances and covariances among the t-values 
regardless of the overlap in individuals measured for the different 
traits. Therefore we hope it will be widely useful including in the 
analysis of published GWAS results where only the effect of each 
SNP and its standard error are available. 

Bolormaa et al. [4] carried out a multi-trait GWAS by 
performing a principle component analysis of the traits and then 
single trait GWAS on the uncorrelated principle components. The 
final test statistic was a sum of the individual principle component 
chi-squared values. The analysis used in the current paper gives 
very similar results to those of Bolormaa et al. [4] but the previous 
method requires that individual data is available and that all 
individuals are measured for all traits. 

We distinguished between two linked QTL and one QTL with 
pleiotropic effects using two types of evidence. When one QTL 
explains the results, the SNPs in the region are highly correlated in 
their effects across traits and when the best SNP is fitted in the 
model the significance of the effects of the other SNPs drops 
markedly as illustrated by the results for BTA 14 in Figure 5. 
Conversely, when there are two or more QTLs in a small region, 
such as BTA7_93-98 Mb, the SNPs show low correlations across 
traits and are still significant after the most significant is included 
in the model (Figure 6). There are few reports in the literature that 
aim to distinguish between linked and pleiotropic QTLs [29-31]. 
Karasik et al. [29] and Olsen et al. [30] conclude that pleiotropy 
exists if the same SNP or QTL region affects both traits. David et 
al. [31]'s method uses only two traits but, like ours, is based on the 
correlation between SNP effects. 

Pleiotropy of individual QTL contributes to the genetic 
correlation between traits. If two traits have a high and positive 
genetic correlation it implies that most QTL affect them both in 
the same direction. For instance, most SNPs with significant effects 
affect height and weight in the same direction and thus help to 
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Table 4. Description of the 28 lead SNPs and their P values In the genome wide association studies (GWAS). 




Group^ 


SNP order 


SNP name 


BTA^ 


Position 


P value 


Linear index^ 


1 


1 


BovineHD2000001543 


20 


4873556 


l.lE-16 


BTA20_4.9 Mb 


1 


2 


BovineHD1400007259 


14 


25015640 


l.lE-16 


BTA14_25 Mb 


1 


3 


BovineHD0500013788 


5 


47727773 


1.1E-16 


BTA5_47.7 Mb 


1 


4 


BovineHD0600010976 


6 


40093712 


l.lE-16 


BTA6_40.1 Mb 


2 


5 


BovineHD2600007456 


26 


28012143 


2.4E-08 


BTA26_28.0 Mb 


2 


6 


BovineHD25000008a2 


25 


3747518 


2.0E-06 


BTA25_3.7 Mb 


2 


7 


BovineHD1000026655 


10 


92188144 


7.9E-08 


BTA10_92.2 Mb 


2 


8 


BovineHD0700028765 


7 


98540675 


1.1E-16 


BTA7_98.5 Mb 


2 


9 


BovineHD2900015063 


29 


44837096 


l.lE-16 


BTA29_44.8 Mb 


3 


10 


BovineHD0300023058 


3 


80105316 


3.2E-15 


BTA3_80.1 Mb 


3 


n 


BovineHD1700007012 


17 


24884021 


2.7E-09 


BTAl 7_24.9 Mb 


3 


12 


Hapmap42512-BTA-32321 


13 


34909187 


5.0E-1 1 


BTA13_34.9 Mb 


3 


13 


BovineHD0200007253 


2 


25222940 


5.9E-06 


BTA2_25.2 Mb 


3 


14 


BovineHD2500004094 


25 


14547288 


1.7E-11 


BTA25_14.5 Mb 


3 


15 


BovineHD0600003225 


6 


12748745 


1.3E-08 


BTA6_1 2.7 Mb 


3 


16 


BovineHD1900007319 


19 


25052604 


2.3E-08 


BTA19_25.1 Mb 


4 


17 


BovineHDl 70001 7438 


17 


61227950 


2.0E-10 


BTAl 7_61 .2 Mb 


4 


18 


BovineHD0800017674 


8 


59156184 


3.3E-08 


BTA8_59.2 Mb 


4 


19 


BovineHD0400021462 


4 


77561148 


8.6E-1 1 


BTA4_77.6 Mb 


4 


20 


BovineHDl 30001 8707 


13 


65917704 


3.1 E-07 


BTA13_65.9 Mb 


4 


21 


BovineHD1200013180 


12 


47984330 


6.9E-1 1 


BTA12_48.0 Mb 


4 


22 


BovineHD0900004919 


9 


18195454 


8.6E-07 


BTA9_1 8.2 Mb 


4 


23 


BovineHDl 50001 6882 


15 


58463005 


6.3E-10 


BTA15_58.5 Mb 


4 


24 


BovineHD2300012740 


23 


43919433 


8.9E-1 1 


BTA23_43.9 Mb 


4 


25 


BovineHD2100000105 


21 


898385 


1.6E-05 


BTA21_0.9 Mb 


4 


26 


BovineHD21 00005354 


21 


19018980 


7.1 E-1 1 


BTA21_19.0 Mb 


4 


27 


BovineHD0900029140 


9 


100532649 


2.2E-09 


BTA9_100.5 Mb 


4 


28 


BovineHD0700027239 


7 


93244933 


1.1E-16 


BTA7_93.2 Mb 



^ = Group of the lead SNPs that were clustered together as shown on Figure 7. 

^ = Bos taurus chromosome number. 

^ = 28 linear indexes corresponding to the 28 lead SNPs. 
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explain the known high genetic correlation between these two 
traits [32]. Past research [33,34] has also found a positive genetic 
correlation between meat tenderness and marbling or intra- 
muscular fat. Consistent with that we found that SNPs that 
increase tenderness (decrease LLPF) usually increase marbling 
(CMARB or CIMF; Table 5). Similarly, SNPs that increase IGFl 
concentration nearly always decrease age at puberty explaining 
the negative genetic correlation between these traits. A low genetic 
correlation between two traits might imply that they are controlled 
by different QTL but it could also indicate some QTL affect them 
in the same direction and some in opposite directions. For 
instance, a low genetic correlation between weight and fatness [34] 
could be explained by the fact that some QTL affect weight and 
fatness in the same direction (Group 3) whereas others affect them 
in opposite directions (Group 1). 

Some significant SNPs map near to already known genes with 
effects on the traits studied, such as calpain 1, calpastatin and 
PLAGl. In other cases there are candidates that are homologous to 
known genes affecting growth and composition in other species 
(e.g., HMGA2). However, there are QTL in Table 7 for which we 
could find no obvious candidate in cattle. 



We defined 4 groups of SNPs by a cluster analysis of the 28 lead 
SNPs such that SNPs within a group have a somewhat similar 
pattern of effects across traits. These groups were expanded by 
including SNPs whose effects were correlated with those of one of 
the lead SNPs in the group. If the 4 groups of QTL represent 
different physiological pathways, one might expect the genes that 
map near the QTL of a group to show some similarity of function. 
To an extent this is so. Group 2 SNPs, which are associated with 
tenderness, include SNPs near calpain 1 [CAPJVl) and calpastatin 
(CAST) that affect tenderness via muscle fibre degradation [12-15]. 
Other SNPs in group 2 are close to genes involved in fat 
metabolism (acyl-CoA synthetase and fatty acid desaturase). This 
may be coincidental but there is a known genetic correlation 
between intra-muscular fat and tenderness [34] and SNPs in group 
2 tend to affect both traits (Table 5). 

Of the SNPs in Group 1, one on BTA 14 probably tags PLAGl, 
the 2 SNPs on BTA 5 are near HMGA2 and IGFl, respectively, die 
SNP on BTA 21 is near PLIN, the SNP on BTA 6 is near CCKAR 
and within 2 Mb of NCAPG, all of which have been reported to 
affect size in other species [35-41]. The mechanism by which they 
do this is uncertain. HMGA2 is a transcription factor needed to 
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Figure 5. The — logio(^-values) of the multi-trait test calculated using SNP effects from the single-trait GWAS for 32 traits on BTA 14 
before (A) and after (B) fitting 28 lead SNPs in the model. In (B) the significance of the lead SNP is also given after fitting the other 27 lead 
SNPs. 
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prevent stem cells from differentiating and thus a polymorphism in 
it could affect growth prior to terminal differentiation. IGFl is the 
growth factor that mediates the effect of growth hormone. PLAGl 
is a transcription factor thought to regulate expression of IGFl, 
which is important in growth. PLIM encodes a growth factor 
receptor-binding protein that interacts with insulin receptors and 
insulin-like growth-factor 1 receptor [IGFIR). PLFV is required for 
maximal liposis and utilization of adipose tissue [42] . 

Group 3 SNPs affect fatness and the SNP on BTA 3 are in the 
leptin receptor gene {LEPR), the SNP on BTA 13 is near LPIIVS 
(which regulates fatty acid metabolism), the SNP on BTA 21 is 
again near PLIN indicating that this QTL has similarities to both 
groups 1 and 3 (Table 7). LEPR is a receptor for leptin and is 
involved in the regulation of fat metabolism. It is known that leptin 
is an adipocyte-specific hormone that regulates body weight and 
plays a key role in regulating energy intake and expenditure. 
Other Group 3 SNPs were near genes that encode muscle proteins 



such as myosin and actin, which are involved with muscle 
contraction (e.g., myotUin on BTA 7 encodes a cytoskeletal protein 
which plays a significant role in the stability of thin filaments 
during muscle contraction). We do not know which, if any, of these 
genes contain causal mutations but it seems likely that the QTL 
within each group are somewhat heterogeneous. This would not 
be surprising given the complexity of feedback mechanisms of 
growth of mammals. It may be that changes to either muscle or fat 
growth indirectly affect growth of the other tissue. 

However, even QTL that have a similar pattern of pleiotropic 
effects, show differences in the detail of this pattern. For instance, 
the Group 1 QTL might all be described as affecting 'mature size', 
but the one on BTA 14, which is presumably PLAGl [9,1 1], has a 
greater effect on reproductive traits than the others in Group 2. 
On the other hand, the QTL on BTA 5 has an unusual pattern of 
effects in that it redistributes fat from the P8 site on the rump to 
the rib and intramuscular depots. This QTL maps close to the 
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Figure 6. The — logio(^-values) of the multi-trait test calculated using SNP effects from the single-trait GWAS for 32 traits on BTA 7 
before (A) and after (B) fitting 28 lead SNPs in the model. In (B) the significance of the lead SNP is also given after fitting the other 27 lead 
SNPs. 
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gene HMGA2, which contains polymorphisms afTecting growth, 
fatness and fat distribution in humans, mice, horse, and pigs 
[35,36,38,39], 

Based on these results, it would appear that, although QTL can 
be put in meaningful groups, each QTL has its own pattern of 
effects. For instance, PLAGl might be described as a gene affecting 
mature size but with additional effects on reproduction, while 
HMGA2 affects mature size and fat distribution. This could be 
explained if genes exist in a network rather than in pathways. 
Then each gene has a unique position in the network and 
therefore a unique pattern of effects. In addition, many genes 
occur in multiple networks in which they can have different 
functions. 

Beef cattle breeders seek to change the genetic merit of their 
cattle for many of the traits studied here. The pattern of effects of 
each QTL indicates that some would be more useful for selection 
than others. Some QTL have desirable effects on one trait but 



undesirable effects on other traits. For instance. Brahman breeders 
have evidently selected for the allele of PLAGl that increases 
mature size [1 1], but this has decreased the fertility of their cattle. 
On the other hand, some QTL have an allele with desirable effects 
on more than one trait and appear to be good targets for selection. 
For instance, the QTL on BTA 4 has an allele that increases retail 
beef yield and marbling but also decreases sub-cutaneous fat, 
which is a highly valuable pattern. Selection for this allele would 
be beneficial in cattle intended for most markets because cattle 
prices reflect yield and intramuscular fat scores, whereas 
subcutaneous fat generally enters the by-product stream. 

In conclusion, we have used a novel multi-trait, meta-analysis to 
map QTL with pleiotropic effects on 32 traits describing stature, 
growth, and reproduction. The distinctive features of the method 
are 1) increased power to detect and map QTL and 2) use of 
summary data on SNP effects when individual level data are not 
available. We have also presented two methods (one new) for 
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Figure 7. Correlation matrix between the 28 lead SNPs calculated from SNP effects on 32 traits (reordered for constructing a 
dendrogram). 

doi:1 0.1 371 /journal.pgen.1 0041 98.g007 



distinguishing between linked and pleiotropic QTL (tlie correla- 
tion between SNP effects across traits and the effects of one SNP 
conditional on the effect of another SNP), and found pleiotropic 
QTL which appear to cluster iiito 4 functional groups based their 
trait effects. We used linear indices of 22 traits 1) to validate the 
effects of SNPs on multiple traits and 2) to find additional QTL 
belonging to the 4 functional groups. We identified candidate 
genes in those groups that have known biological functions 
consistent with the biology of the traits. 

Materials and Methods 

Animal Care and Use Committee approval was not obtained for 
this study because no new animals were handled in this 
experiment. The experiment was performed on trait records and 
DNA samples that had been collected previously. 

SNP data 

In total, 729,068 SNP data were used in this study. Those SNP 
were obtained from 5 different SNP panels: the lUumina HD 
Bovine SNP chip (http://res.illumina.com/documents/products/ 
datasheets/datasheet_bovinehd.pdf) comprising 777,963 SNP 
markers; the BovineSNPSOK version 1 and version 2 BeadChip 
(lUumina, San Diego) comprising 54,001 and 54,609 SNP, 
respectively; the IUuminaSNP7K panel comprising 6,909 SNP; 
and the ParaUeleSNPlOK chip (Affymetrix, Santa Clara, CA) 
comprising 1 1,932 SNP. All SNP were mapped to the UMD 3.1 
buUd of the bovine genome sequence assembled by the Centre for 



Bioinformatics and Computational Biology at University of 
Maryland (CBCB) (http://www.cbcb.umd.edu/research/ 
bos_taurus_assembly.shtml. High density SNP genotypes were 
imputed for all animals using Beagle (Browning and Browning, 
2011). The approaches used for performing quality control and 
imputation were described in [8] . The details of the quality control 
and imputation were recapitulated below. 

Stringent quality control procedures were applied to the SNP 
data of each platform. SNP were excluded if the call rate per SNP 
(this is the proportion of SNP genotypes that have a GC (lUumina 
GenCall) score above 0.6) was less than 90% or they had duplicate 
map positions (two SNP with the same position but with different 
names) or an extreme departure from Hardy-Weinberg equilib- 
rium (e.g., SNP in autosomal chromosomes with both homozygous 
genotypes observed, but no heterozygotes). Furthermore, if the call 
rate per individual was less than 90%, those animals were removed 
from the SNP data. The SNP data were edited within breed group 
and within each platform and were subsequently combined. 

After all the quality control tests were applied, 729,068 SNP of 
the HD SNP chip were retained on 1,698 animals and the missing 
genotypes were fiUed using the BEAGLE program [43]. Imputa- 
tion was done using 30 iterations of BEAGLE. The genotypes for 
each SNP were encoded in the top/ top lUumina A/B format and 
then genotypes were reduced to 0, 1, and 2 copies of the B aUele. 
The imputations of the 7 K, 10 K and 50 K SNP genotype data 
to the 729 068 SNPs were performed in two sequential stages: 
from 7 K or 10 K or 50 K data to a common 50 K data set and 
then from the common 50 K data set to 800 K data. In the first 
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Table 6. Total number of significant SNPs {P<10 their FDR (%), and number of significant SNP on each chromosome (which is 
in parenthesis) for the 28 linear indexes corresponding to the 28 lead SNPs. 




Group^ SNP^ order 


Linear Index^ 


Total No. sig. SNP" 


FDR (%) 


Number of significant SNPs^ 
(chromosome number) 


1 1 


BTA20 4.9 Mb 


1313 


0.5 


69 (4), 21 (5), 40 (6), 60 (7), 8 (10), 1036 (14), 6 
(18), 57 (20) 


1 2 


BTA14_25 IVlb 


2371 


0.3 


6 (4), 180 (5), 159 (6), 9 (8), 1962 (14), 8 (17), 
20 (20), 5 (21) 


1 3 


BTA5_47.7 IVlb 


1668 


0.4 


615 (5), 64 (6), 24 (11), 907 (14), 19 (17), 18 


1 4 


RTAfi 4n 1 Mh 


1552 


0.4 


6 f41 1R4 143 ffi^ S f1 1 ^ 1 1 54 f14^ 13 
(18), 18 (20), 6 (21), 6 (25) 


2 5 


BTA26_28.0 IVlb 


47 


14.7 


8 (6), 8 (14), 11 (18), 5 (20), 6 (26) 


2 6 


BTA25_3.7 Mb 


50 


13.8 


9 (9), 7 (13), 10 (21), 13 (29) 


2 7 


BTA1C_92.2 Mb 


378 


1.8 


27 (7), 6 (10), 127 (14), 204 (29) 


2 8 


BTA7_98.5 Mb 


343 


2.0 


64 (7), 264 (29) 


2 9 


BTA29_44.8 Mb 


559 


1.2 


5 (1), 61 (7), 25 (10), 458 (29) 


3 1 0 


BTA3 80.1 Mb 


175 


4.0 


5 (1), 130 (3), 19 (13), 5 (21) 


3 1 1 


BTA1 7 24.9 Mb 


173 


4.0 


23 (1), 63 (3), 1 1 (7), 11 (13), 53 (17) 


3 12 


BTA1 3 34.9 Mb 


131 


5.3 


109 (13), 6 (21) 


3 13 


RTA? 2S 2 Mh 


27 


25.6 




3 14 


RTA7S 14 S Mh 


26 


26.6 




3 15 


BTA6 1 2.7 Mb 


70 


9.9 


20 (1), 21 (7), 10 (17) 


3 1 6 


BTA19 25.1 Mb 


~j\ 


9.8 


41 (7), 1 1 (10), 10 (19) 


4 17 


BTAl 7 61 .2 Mb 


667 


1.0 


5 (5), 18 (6), 6 (8), 595 (14), 6 (17), 20 (20), 5 
(25) 


4 18 


BTA8_59.2 Mb 


47 


14.7 


22 (8), 12 (16) 


4 1 9 


BTA4 77.6 Mb 


723 


1.0 


38 (4), 7 (6), 667 (14) 


4 20 


BTAl 3 65.9 Mb 


41 


16.9 


S (41 7 fSl S f13) S fi (2^) 


4 21 


BTA12_48.0 Mb 


12 


57.7 




4 22 


BTA9_18.2 Mb 


310 


2.2 


8 (7), 294 (14) 


4 23 


BTA15_58.5 Mb 


93 


7.4 


7 (6), 45 (7), 15 (10), 8 (14), 5 (IS) 


4 24 


BTA23_43.9 Mb 


51 


13.6 


10 (7), 11 (20), 12 (22), 6 (23) 


4 25 


BTA21_0.9 Mb 


75 


9.2 


20 (1), 34 (7) 


4 26 


BTA21_19.0 Mb 


197 


3.5 


18 (1), 86 (5), 7 (6), 26 (7), 16 (11), 5 (18), 7 
(20), 22 (21) 


4 27 


BTA9_100.5 Mb 


55 


12.6 


32 (7), 1 0 (9) 


4 28 


BTA7_93.2 Mb 


268 


2.6 


8 (6), 121 (7), 14 (9), 98 (14), 9 (20) 


' = Group of the lead SNPs that were clustered together as shown on Figure 7. 
^ = This SNP order refers SNPs, which are given on Table 4. 
^ = 28 linear indexes corresponding to the 28 lead SNPs. 

'' = Total number of significant SNPs which are significantly (P<10"^) associated with each of linear indexes. 

^ = only presented if number of significant SNPs on each chromosome (P<10"^) is more than four. Chromosome number is in parentheses. 
doi:1 0.1 371 /journal.pgen.1 0041 98.t006 



Stage imputation was done within breed, using 30 iterations of 
Beagle. In the second stage, the HD genotypes of each breed type 
(501 B. taurus and 520 B. indicus) were used as a reference set to 
impute from the 50 K genotypes of each pure breed within the 
corresponding breed type. For the four composite breeds, all the 
HD genotypes (1,698) were used as a reference set to impute the 
50 K genotypes of each composite breed up to 800 K. The 
number of genotypes for each platform used as reference animals 
for imputation and number of animals used in this study is given in 
Table 8. The mean values, for the accuracy of imputation 
provided by BEAGLE, are in Table 9. After imputation, an 
additional quality control step was apphed based on comparing 
allele frequencies between SNP platforms to detect SNP with very 



different allele frequencies indicating incorrect conversion between 
platforms. In total, 10,191 animals, which had a record for at least 
one trait and also had SNP genotypes, were used in this study. 

Animals and phenotypes 

The catde were sourced from 9 different populations of 3 breed 
types. They include 4 different Bos taurus (Bt) breeds (Angus, Murray 
Grey, Shorthorn, Hereford), 1 Bos indicus (Bi) breed (Brahman 
catde), 3 composite (BtxBi) breeds (Belmont Red, Santa Gertrudis, 
Tropical composites), and 1 recent Brahman cross population (Fl 
crosses of Brahman with Limousin, Charolais, Angus, Shorthorn, 
and Hereford). Details on population structure of those animals 
have previously been described by Bolormaa et al. [8] . 
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Figure 8. The positions of the best SNPs (5x10 ^) that are highly correlated with each group of linear index. 

doi:1 0.1 371/journal.pgen.1 0041 98.g008 



Phenotypes for 32 difTerent traits including growth, feed intake, 
carcass, meat quality, and reproduction traits were collated from 5 
different sources: The data sources included the Beef Co-operative 
Research Centre Phase I (CRCI), Phase II (CRCII), Phase III 
(CRCIII), the Trangie selection lines, the Durham Shorthorn 
group (the detailed description is reported by Bolormaa et al. [8] 
and Zhang et al. [44]. Not all cattle were measured for all traits. 
The trait definitions, number of records for each trait and 
heritability estimate and mean and its SD of each trait are shown 
in Table 1. 

Single-trait GWAS 

Mbced models fitting frxed and random effects simultaneously 
were used for estimating heritabilities and associations with SNP. 
Variances of random effects were estimated in each case by 
REML. The estimates of heritability were calculated based on all 
animals with phenotype and genotype data and their 5-generation- 
ancestors using the following mixed model: trait ~ mean + fixed 
effects -I- animal -I- error; with animal and error fitted as random 
effects. The individual animal data for the 32 traits were used to 
perform genome wide association studies (GWAS), in which each 
SNP was tested for an association with the trait. The association 



between each SNP and each of the traits was assessed by a 
regression analysis using the ASReml software [45]. The model 
used was the same as for estimating heritability, but SNPi (SNP;, 
i = 1 , 2, 3, . . . , 729068) was additionally fitted as a covariate one at 
time (trait ~ mean + frxed effects + SNP; + animal + error). The 
model used to analyse the traits consistendy included dataset, 
breed, cohort and sex as fixed effects. Other fixed effects varied by 
trait. The fixed effects were fitted as nested within a dataset. 
Further details of the models used in the analysis are reported by 
Johnston et al. [46], Reverter et al. [47], Robinson and Oddy [48], 
Barwick et al. [49], Wolcott et al. [50], Bolormaa et al. [8], and 
Zhang et al. [44]. 

Multi-trait meta-analysis chi-squared statistic 

We apphed a new statistic to find the significance level of SNPs 
in a multi-trait analysis. This statistic determines the importance 
of the effects of SNP; (i= 1, 2, 3, 729068) across aU (32) traits 
studied. Our multi-trait test statistic is approximately distributed 
as a chi-squared with 32 degrees of freedom. It tests a null 
hypothesis stating that the SNP does not affect any of the traits. 
For each SNP, the multi-trait statistic was calculated by the 
formula: 
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Table 7. Positions of plausible candidate genes identified in Bos taurus that were within 1 Mb of the significant SNPs of each 
group. 





Plausible candidate genes 


Name of most sig. SNP 


BTA 


Position (bp)^ 


No. SNP^ 


Gene 


Gene name 


BTA 


Start and end position 


Group 7 


BovineHD0400000122 


4 


665622 


1 










BovineHD0400 000284 


4 


1261978 


1 










BovineHD0400 000645 


4 


2702626-2722628 


3 










BovineHD0400 001481 


4 


5191680-5447479 


5 


BT.24356 


growth factor receptor- 
bound protein 10 


4 


5104899_5244348 


BovineHD0400 002366 


4 


7108902-7669014 


4 










BovineHD0400 002703 


4 


8722920-8745104 


2 










BovineHD0400 003043 


4 


9218984-9965927 


5 










BovineHD0500010279 


5 


35255792-35896741 


7 


AN06 


anoctamin 6 


5 


35059697_35 150452 


BovineHD0500035418 


5 


46889976-48889976 


141 


HMGA2 


High Mobility Group AT- 
Hook 2 


5 


48053846_481 99963 


BovineHD0500018592 


5 


66386971-66484917 


4 


ICF-I 


insulin-like growth factor 
1 (somatomedin C) 


5 


66532877_66604734 


BovineHD0600010976 


6 


39093712-41093712 


23 










BovineHD0800024885 


8 


83575973-83693221 


3 










BovineHDl 100030149 


11 


100667966-103599900 


2 


OFIIB 


growth factor independent 1 1 
1 B transcription repressor 


103039731_103051510 


BovineHD1400 007259 


14 


24515640-26515640 


347 


PL AG! 


pleiomorphic adenoma 
gene 1 


14 


25007291_25009296 


BovineHDl 500003824 


15 


15384481 


1 










BovineHDl 600020726 


16 


72937754 


1 










ARS-BFGL-NGS-36082 


17 


55916203 


1 










BovineHDl 80001 0790 


18 


35817150 


1 


LCAT 


lecithin-cholesterol 
acyltransferase 


18 


35544370_3554761 1 


BovineHD2000001543 


20 


4510146-4917418 


40 










BovineHD21 00006256 


21 


19315388-21252249 


3 


PUN 


perilipin 


21 


21502826_21516686 


BovineHD21 0001 5006 


21 


52213249 


1 










BovineHD2500 008003 


25 


28785015 


1 










Group 2 


BovineHD0700028765 


7 


97444057-99791666 


48 


CAST 


calpastatin 


7 


98444979_98581253 


BovineHD0900015975 


9 


58434589-58447291 


2 










BovineHDl 000026655 


10 


92188144 


1 










BovineHDl 30001 3410 


13 


45909405-45911134 


2 










BovineHD2500 005078 


25 


18023277 


1 


ACSM 


acyl-CoA synthetase 
medium-chain family 
members (5, 2A, 1, and 3) 


25 


18207129_18656582 


BovineHD2600 007456 


26 


28012143-28015564 


3 










BovineHD2900012374 


29 


39901491-41901491 


16 


DAGLA 


diacylglycerol lipase, alpha 


29 


40857731_40883995 










FADS 


fatty acid desaturases 
(1,2, and 3) 


29 


40940932_41 102449 


BovineHD2900013185 


29 


43070926-45070926 


98 


CAPNl 


calpain 1, (mu/l) large 
subunit 


29 


44064429_44089990 










HBP 


fibroblast growth factor 
(acidic) intracellular 
binding protein 


29 


44665294_44669337 


Group 3 


BovineHDOl 0001 4747 


1 


51055086-52556409 


9 


MYH15 


myosin, heavy chain 15 


1 


53530575_53687569 


BovineHDOl 00024891 


1 


87504288 


1 


ACTL6A 


actin-like 6A 


1 


88129517_88160918 


BovineHD0300023058 


3 


79105316-81105316 


26 


LEPR 


similar to leptin receptor; 
leptin receptor 


3 


80071 689_80147000 


BovineHD0300027512 


3 


95779023-95808937 


3 
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Table 7. Cont. 





Plausible candidate genes 


Name of most sig. SNP 


BTA 


Position (bp)^ 


No. SNP^ 


Gene 


Gene name 


BTA 


Start and end position 


BTA-71063-no-rs 


4 


67772757 


1 










BovineHDOeOO 003224 


6 


12746688-12749442 


3 










BovineHD070001 4472 


7 


49991320 


1 


BT.33253 


myotilin 


7 


50941 047_50958425 


BovineHD0700027193 


7 


93075709-93076407 


2 










BovineHD1300010190 


13 


34099934-36099934 


33 


ZEBl 


Zinc Finger E-Box Binding 
Homeobox 1 


13 


34063175-34261299 


BovlneHDl 30001 8388 


13 


63952212-64544354 


4 


BT.86382 


growth differentiation 
factor 5 


13 


653401 32_65343889 


BovineHDl 30002041 5 


13 


71426238-71429231 


2 


LPm 


llpin 3 


13 


706661 41 _70683589 


BovlneHDl 70000701 6 


17 


24451063-24892281 


37 










BovineHDl 70001 6275 


17 


57243957-57313932 


2 


ARPC3 


actin related protein 2/3 
complex, subunit 3, 21 kDa 


17 


56573543_56584354 










BT. 105634 


myosin, light chain 2, 
regulatory, cardiac, slow 


17 


5695381 3_56961 603 


BovineHD1900 007319 


19 


25043894-25097956 


6 


BT.50868 


spermatogenesis 
associated 22 


19 


24809409_24824295 


BovineHD21 00006370 


21 


21632180-32790802 


2 


PUN 


perilipin 


21 


21502826_21516686 


BovineHD2500 004094 


25 


14547288 


1 


BFAR 


bifunctional apoptosis 
regulator 


25 


13640454_1 3672230 










BT. 100599 


myosin, heavy chain 11, 
smooth muscle 


25 


14218281_14343745 


Group 4 


BovineHDOl 00046441 


1 


44437965 


1 


C0L8A1 


collagen, type VIII, alpha 1 


1 


43541936_43717619 


BovineHD0400018623 


4 


67742508-67780601 


3 










BovineHD0400021462 


4 


77156871-77655595 


23 


iCFBP3 


insulin-like growth factor 
binding protein 3 


4 


76705105_76712709 


BovineHD0500030537 


5 


106417996-107475266 


10 


FCF 


fibroblast growth factors 
(6 and 23) 


5 


106157909_106216757 


BovineHD0700021658 


7 


72824476-73605795 


2 


BT.27996 


adrenergic, alpha-IB-, 
receptor 


7 


73611117_73672127 


BovineHD0700027239 


7 


92244933-94244933 


55 


ARRDC3 


arrestin domain 
containing 3 


7 


9324041 9_93253094 


BovineHD0700027869 


7 


95667222-95714161 


3 










BovineHD0800002287 


8 


7222286 


1 










BovineHD0800017674 


8 


59156184 


1 










BovineHD0800018611 


8 


62416119-62419858 


2 










BovineHD0900028542 


9 


99035883 


1 










BovineHD0900029128 


9 


100519621-100535463 


9 


PACRG 


PARK2 co-regulated 


9 


99649026_1 001 80707 


BovineHDl 1000301 34 


11 


103536624-103761234 


2 


CFIIB 


growth factor independent 1 1 
1 B transcription repressor 


103039731_103051510 










LCN9 


llpocalin 9 


11 


103327151_103328557 


BovineHDl 20001 31 80 


12 


47984330-47987280 


2 










BovineHDl 30001 8707 


13 


65917704 


1 


BT.86382 


growth differentiation 
factor 5 


13 


653401 32_65343889 










GHRH 


growth hormone 
releasing hormone 


13 


66863225_66872531 


BovineHDl 50001 4253 


15 


49447046 


1 










BovineHDl 50001 6882 


15 


58463005-58469454 


3 










BovineHDl 70001 51 95 


17 


53758523 


1 










BovineHDl 70001 7438 


17 


61218439-61227950 


3 


HRK 


harakiri, BCL2 interacting 
protein (contains only BH3 
domain) 


17 


60417162_60437117 



BovineHD21 00005354 21 18979918-19055070 11 
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Table 7. Cont. 



Plausible candidate genes 



Name of most sig. SNP BTA Position (bp)^ 



No. SNP" 



Gene 



Gene name 



BTA 



Start and end position 



BovineHD2200014376 
BovineHD2300012740 
BovineHD2500 002877 
BovineHD41 0001 8660 



22 
23 
25 
28 



50346158-50399252 
43906223-43925150 
10725405-10730359 
36064247-36104864 



For regions where several highly significant SNP were observed, the most significant SNP is presented; 
^Number of SNP that were significant at P<5xl0"^ within the specified region; sig. = significant. 
doi:l 0.1 371 /journal.pgen.l 0041 98.t007 



Multi-trait x^ = r', F"'?,-, 

where ti is a 32 x 1 vector of the signed t-values of SNPj for the 32 
traits ti is a transpose of vector i,- (1 x32) F"'' is an inverse of the 
32x32 correlation matrix where the correlation between two 
traits is the correlation over the 729,068 estimated SNP effects 
(signed t-values) of the two traits. 

This approximation method is justified as follows: t-values based 
on many degrees of freedom have an error variance close to 1 .0 
and t is distributed as a x (i) under the null hypothesis. Therefore, 
if the SNP effects on n different traits were estimated indepen- 
dently with no error covariance, the sum of the t^ (i.e., t^Iti, where 
/is an identity matrix) would be distributed as a chi-squared with n 
degrees of freedom. Our approximate analysis would generate 
exactly this test statistic if the t values for different traits had no 
error covariance. If the t values for different traits had an error 
(co)variance matrix D, then the correct test statistic would be 
t D^^t distributed as a chi-squared with n degrees of freedom. We 
approximate D by the correlation between the estimated SNP 
effects across the 729,068 SNPs. We assume that most SNPs have 
httle or no effect on most traits, so most of the (co)variance 
between effects is error covariance. However, the SNPs that do 
have a real effect on a trait will inflate the variance of SNP effects 
above 1.0. Therefore we convert the covariance matrix of SNP 
effects [D) to a correlation matrix [V) because this returns the 
diagonal elements to 1.0 which we know is the correct error 
variance for t statistics. Although it is not proof of the method, 
perhaps we offer the following intuitive analysis. If the SNP effects 
on different traits were estimated in independent GWAS then the 
correlation of SNP effects would be low and V~I and the test 
statistic would be the sum of independent chi-squares, as expected. 
On the other hand, if the SNP effects on different traits were 
estimated from the same individuals, then the correlation of error 
variances would be driven mainly by the phenotypic correlations 
between the traits. In this case the correlation of SNP effects would 
also reflect these phenotypic correlations and the test statistic we 
use would be a good approximation of the correct test statistic. 

Power of multi-trait meta-analysis to detect QTL 

False discovery rate (FDR) . The increase of power of QTL 
detection was investigated by comparing FDR calculated in multi- 
trait test with FDR calculated in single-trait GWAS. Following 
Bolormaa et al. [51], the false discovery rate was calculated as 



^ T 



number of SNP that were significant at the P -value tested and Tis 
the total number of SNP tested. 

Validation of SNP effects. To validate SNP effects in the 
multi-trait test, we developed a new approach that uses a hnear 
index of traits that had maximum correlation with the SNP. This 
new approach was carried out as follows: 1) Splitting data as 
reference and validation populations; 2) Predicting missing 
phenotypes using multiple regression approach; 3) Performing 
single-trait GWAS in the reference population to get the SNP 
effects based on only the reference population; 4) Calculating a 
linear index of 22 traits for each SNP, which had maximum 
association with the SNP in reference population; and 5) 
Validating SNP effects using GWAS to discover if there is any 
association between the corresponding linear index and SNP. 

1) Splitting data. For validation purposes, a 5-fold cross validation 
schema was carried out. The whole data were split into five 
sets by allocating all of the offspring of randomly selected sires 
to one of the five datasets. Then one of the 5 divisions was 
used as a validation population and the other 4 divisions as the 
reference population. In this way no animal used for 
validation had paternal half sibs in the reference population. 

2) Predicting missing phenotypes. The linear index on individual 
animals could only be calculated for animals with all traits 
measured. This required individual animal level data. 
Therefore this process was restricted to the 22 non- 
reproduction traits since the cows and bulls, on which the 
reproductive traits were measured, were not recorded for 
carcass traits. Even among these 22 traits, not all animals were 
measured for all traits. Before the missing phenotypes were 
predicted, the raw phenotypes for each trait were corrected 
for fixed effects using the following model: corrected 
phenotype = phenotype - fixed effects. So missing values were 
filled in by a prediction using multiple regression on the traits 
that were recorded on that animal. This multiple regression 
procedure uses the actual effects (not signed t values) of 
729,068 SNPs for 22 traits that were estimated based on all 
animals (reference and validation population) in order to have 
the same units with phenotype values. For each animal, SNP 
effects for the 22 traits were reordered so that those traits with 
a phenotypic value preceded those traits with missing values. 



■ where P is the P-value tested (e.g., 0.00001), A is the 



;i-p) 



Then the (co)variance matrix of SNP effects among the 22 

traits were calculated and inverted: t/^' = f 

where [f°is the inverse of (co)variance matrix of SNP effects 
between the traits with a phenotype value, Lf" is the inverse of 
(co)variance matrix SNP effects between traits with a missing 
record, and If'vs U"'is the inverse of (co)variance matrix of 
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Table 9. The accuracy of imputation (R^) obtained from Beagle of the genotyped data.^ 



Imputation 
/breed^ 


7 K data 




50 K data 




10 K data 




7 K to 50 K 


50 K to 800 K 50 K to 50 K 


50 K to 800 K 


3 K to 50 K 


50 K to 800 K 


AAMG 


0.89 


0.94 


0.98 


0.95 


0.88 


0.96 


BB 


0.78 


0.90 


0.96 


0.90 






BR 


0.80 


0.92 


0.98 


0.93 






BX 






0.95 


0.85 






HH 


0.75 


0.92 


0.97 


0.90 






SG 


0.75 


0.93 


0.94 


0.93 






ss 


0.87 


0.92 


0.96 


0.91 






TC 


0.76 


0.93 


0.96 


0.95 






Mean 


0.80 


0.92 


0.97 


0.90 


0.88 


0.96 



' =this table is sourced from [8]; 

^ = Angus (AA), Brahman (BB), Belmont Red (BR), Hereford (HH), recent Brahman crosses (BX), Murray Grey (MG), Santa Gertrudis (SG), Shorthorn (SS) and Tropical 
Composites (TC); AAMG = genotypes of AA and MG animals were imputed together. 
doi:1 0.1 371 /journal.pgen.l 0041 98.t009 



the traits with and without a missing record. The missing 
phenotypes (jj,,) were then predicted using the following 
formula: y„ = —(U"")^^ U"''yo, where j„ is a vector of the 
traits measured on a particular animal. 

3) GWAS in reference population. The individual trait GWAS and 
the multi-trait significance test on signed t-values described in 
the previous sections were performed using only the reference 
population. Only the most significant SNP from a sliding 
window of every 1 -Mb-interval was retained to avoid 
identifying a large number of closely linked SNPs whose 
association with traits is due to the same QTL or due to LD to 
an index SNP. If a SNP in each 1 -Mb-interval was significant 
at P<10~' then it was selected to be vahdated in the 
validation population using the linear index of 22 traits. 

4) Calculating linear index. A linear index (yjj of 22 traits that has 
maximum correlation in the reference population with each 
selected SNP was derived. This hnear index was calculated for 
each animal. The phenotype values and the effects of the SNP 
are used to calculate the linear index, so the actual effects of 
the SNP (not signed t values) were in the same units as the 
trait values. The following formula was used to calculate a 
hnear index: yi = b C^^y, where b' is the transpose of a 
vector of the estimated effects of the SNP on the 22 traits 
(1 x22) that was estimated from only the reference population, 
C~' is an inverse of the 22 x22 (co)variance matrix among the 
22 traits calculated from the estimated SNP effects of 729,068 
SNPs only in the reference population, and y is a 22 x 1 vector 
of the phenotype values for 22 traits for each animal in the 
validation sample. 

5) Validating SNP effects ming GWAS. The association between 
each linear index (jjj and each SNP was then tested in the 
validation population. The yi was treated as a new trait 
(dependent variable). The association was assessed by a 
regression analysis (GWAS) using the following model: j/~ 
mean + SNPi -I- animal + error, where animal and error were 
fitted as random effects and SNPi were fitted as a covariate 
one at a time (other fixed effects were removed from the trait 
measurements before forming the linear index). 

In order to see whether the SNPs validated in the validation 
population have the same direction of effects (positive or negative) 
as SNPs in the reference population, we repeated the steps 2, 4, 



and 5 by using the phenotypes of the reference population instead 
of the phenotypes of the vahdation population. Then the directions 
of SNP effects for the linear index in both reference and validation 
populations were checked and the proportion of SNPs whose 
effects were in the same direction in the reference population was 
calculated. 

Multi-trait meta-analysis tends to find SNPs near genes 

The gene start and stop positions were identified using Ensembl 
(www.ensembl.org/biomart/) and SNPs were classified according 
to their distance from the nearest gene. The SNPs were placed in 
bins 1) <100 kb upstream of the start site or downstream of the 
stop site, 2) 100-200 kb upstream or downstream, etc., in 100 kb 
bins. SNPs between the start and stop sites were placed in a 
separate bin (called 0 kb from the nearest gene). For each bin the 
proportion of SNPs that were significant [P< 10 ^) in the multi- 
trait analysis was divided by the total number of SNPs in that bin. 

Single-trait GWAS to test pleiotropy or linkage 

The SNP effects estimated from single-trait GWAS based on all 
animals were used to investigate the relationships between SNPs. 
For each pair of SNPs, the correlation of the effects across 32 traits 
was calculated. Highly positive or negative correlations indicate 2 
SNPs with the same pattern of effects across traits. 

Conditional analysis to test pleiotropy or linkage 

The 28 lead SNPs were selected as follows: On each 
chromosome the one or two most significant SNPs (P<10~''), 
based on the multi-trait analysis, were selected. Two SNPs on the 
same chromosome were only selected if they clearly represented 
two different QTL based on the test for pleiotropy vs linkage. In 
no case were the SNPs less than 2 Mb apart. 

The regression analyses (GWAS) were performed again but 
additionally the 28 lead SNPs were fitted simultaneously in the 
model. The statistical model used was trait ~ mean + fixed 
efiects + SNP; + leadSNPi + leadSNPj + leadSNP;, + ... + 



leadSNP,,, 



animal + error; with animal and error fitted as 



random efiects. The rth SNP (SNP;, i= 1, 2, 3, ... , 729068) and 
28 lead SNPs were fitted simultaneously as covariate effects. 
Then a multi-trait chi-squared statistic was calculated for each 
SNP to test the effects of the SNP across traits after fitting the 28 
lead SNPs. 



PLOS Genetics | www.plosgenetics.org 



21 



March 2014 | Volume 10 | Issue 3 | e1004198 



Multi-trait, Meta-analysis for GWAS 



Cluster analysis 

For each pair of SNPs among the 28 lead SNPs. the correlation 
of their effects across the 32 traits was calculated. Then this 
correlation matrix was used to do the hierarchical clustering of the 
28 lead SNPs leading to 4 groups or clusters as shown in the 
dendrogram drawn using the heatmap function of the R program 
[52]. 

Finding additional SNPs in the 4 groups defined by the 
cluster analysis 

For each of the 28 lead SNPs, we searched for additional SNPs 
with a similar pattern of effects. To do this we used the linear 
index that showed the highest association with a lead SNP, as 
previously defined for validation of the multi-trait analysis. A new 
GWAS was performed for each of 28 linear indexes (yj) treating it 
as a new trait (dependent variable). The following model was used: 
jij ~ mean -I- fixed effects -I- SNP; -I- animal -I- error, where animal 
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